Abstract

Lambek's production machines may be used to generate and recognize sentences
in a subset of the language described by a production grammar. We determine
in this paper the subset of the language of a grammar generated and recognized
by such machines.

1 Introduction 

The focus of this paper is the mechanical generation and recognition of sentences from 
a production grammar [4, 8], which are known in mathematics as semi-Thue systems 
and in linguistics as rewriting systems or generative grammars. The latter, linguistics, 
is an important area of application for production grammars. They were used to study 
French and Latin conjugation [5, 6] and kinship terminology in English [7] and other 
languages [11, 1, 2, 3]. Production grammars were also provided for subsets of English
and French [10, 13] and used in a naive approach to syntactic translation [13]. 

To generate and recognize sentences in languages defined by a production grammar,
Lambek combined two pushdown automata into a single machine [9] and gave
examples of the execution of the machine on simple sentences taken from a grammar 
describing a subset of English. 

Our previous work [13] indicates that Lambek's production machines generate and 
recognize a subset of the language of a grammar — in other words, they do not generate 
or recognize sentences not in the language. This paper analyzes the machines in order 
to determine exactly which subsets of the language are generated and recognized. The 
sublanguage generated is generally a proper subset of the language, which we call 
the leftmost language. Correspondingly, the sublanguage recognized, also generally a 
proper subset of the language, may be seen as a dual to the leftmost language.

*This paper is essentially the same as one that appeared in RAIRO Informatique Théorique et
Applications, 31(5), pp. 483-497, 1997.
2 Production grammars



We review in this section the fundamental material needed in the paper. We assume the 
reader is acquainted with the theory of formal languages, so that only a short overview 
of the notation is necessary. 

A production grammar is a tuple $\mathcal{G} = (V, V_i, V_t, \mathcal{P})$ where $V$ (the vocabulary or
alphabet) is a finite set, $V_i$ and $V_t$ (the initial and terminal vocabularies) are subsets of
$V$, and $\mathcal{P}$ (the productions) is a finite or at least recursive set of pairs $(\Gamma, \Delta)$ with $\Gamma$ and
$\Delta$ strings of elements of $V$. We usually represent an element $(\Gamma, \Delta)$ of $\mathcal{P}$ as $\Gamma \to \Delta$.
An element of $V_t$ is called a terminal symbol, while an element of $V - V_t$ is called a
nonterminal symbol. A string of elements of $V$ will typically be denoted by a Greek
letter, and individual elements of $V$ by capital Roman letters.

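From any production grammar $\mathcal{G} = (V, V_i, V_t, \mathcal{P})$ one obtains the dual grammar
of $\mathcal{G}$ by taking $\mathcal{G}^{-1} = (V, V_t, V_i, \mathcal{P}^{-1})$ where $\mathcal{P}^{-1}$ is the set of all pairs $(\Delta, \Gamma)$ such
that $(\Gamma, \Delta) \in \mathcal{P}$.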

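A production $\Gamma \to \Delta$ is applicable to a string $\sigma$ of elements of $V$ if $\sigma$ is of the form
$\sigma_1 \Gamma \sigma_2$. The application of $\Gamma \to \Delta$ to $\sigma$ is the string $\sigma_1 \Delta \sigma_2$. A production $\Gamma \to \Delta$
is leftmost applicable to a string $\sigma$ if $\sigma$ is of the form $\sigma_1 \Gamma \sigma_2$ and for any production
$\Gamma' \to \Delta'$, if $\sigma$ is of the form $\gamma_1 \Gamma' \gamma_2$, then $|\Gamma| \le |\Gamma'|$ and $|\sigma_1| \le |\gamma_1|$.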

We define the leftmost reduction relation on strings of elements of $V$ as follows: let
$\sigma_1 \to \sigma_2$ if a production of $\mathcal{G}$ is leftmost applicable to $\sigma_1$ and $\sigma_2$ is the application
of the production to $\sigma_1$. A sentence is a string of terminal symbols in $V_t$. The leftmost
language of a grammar $\mathcal{G}$ is the set of all sentences that can be derived via $\to$ starting
from symbols in $V_i$. If we define a reduction relation using the notion of applicability
instead of leftmost applicability, the set of sentences that can be derived is called the
language of the grammar. For emphasis, we sometimes refer to the language as the full
language of the grammar. It is clear that the leftmost language of a grammar is a subset
of the full language. The following grammar shows that the inclusion may be proper:

\[
\begin{aligned}
S &\to ABC \\
AB &\to x \\
BC &\to y \\
C &\to z \\
A &\to w
\end{aligned}
\]

The full language of this grammar is {xz, wy}, and the leftmost language is {wy}. 
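
As a sanity check on the example above, the following Python sketch enumerates both languages by breadth-first search. It is only an illustration under stated assumptions: symbols are single characters, upper-case letters stand for nonterminals and lower-case letters for terminals, and leftmost applicability is read as "smallest position first, then shortest left side". The helpers `applicable`, `successors` and `language`, and the `max_len` guard, are hypothetical names introduced here, not part of the formal development.

```python
from collections import deque

# Productions of the example grammar, as (left side, right side) pairs of strings.
# Upper-case letters are nonterminals, lower-case letters are terminals.
PRODUCTIONS = [("S", "ABC"), ("AB", "x"), ("BC", "y"), ("C", "z"), ("A", "w")]


def applicable(productions, s):
    """Yield (position, left, right) for every production applicable to s."""
    for left, right in productions:
        start = s.find(left)
        while start != -1:
            yield start, left, right
            start = s.find(left, start + 1)


def successors(productions, s, leftmost):
    """Return the set of strings reachable from s in one reduction step."""
    candidates = list(applicable(productions, s))
    if leftmost and candidates:
        # Assumed reading: smallest position first, then shortest left side.
        best = min((pos, len(left)) for pos, left, _ in candidates)
        candidates = [c for c in candidates if (c[0], len(c[1])) == best]
    return {s[:pos] + right + s[pos + len(left):] for pos, left, right in candidates}


def language(productions, start, leftmost, max_len=10):
    """Enumerate the sentences derivable from start (strings longer than max_len are dropped)."""
    seen, queue, sentences = {start}, deque([start]), set()
    while queue:
        s = queue.popleft()
        if s.islower():  # every symbol is a terminal
            sentences.add(s)
            continue
        for t in successors(productions, s, leftmost):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return sentences
```

Under these assumptions, `language(PRODUCTIONS, "S", leftmost=False)` returns `{"xz", "wy"}` and `language(PRODUCTIONS, "S", leftmost=True)` returns `{"wy"}`, in agreement with the two sets stated above.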

We assume in this paper that all grammars under consideration are well-formed, in
the sense that all reduction sequences ultimately lead to sentences, that is, strings of terminal
symbols. This among other things implies that there is at least one production for each
initial symbol in $V_i$. We shall also assume, as is usually done, that there is no empty
production and that no terminal appears on the left side of a production.

Let us now present three transformations one needs to perform on a grammar $\mathcal{G}$
to make it suitable for treatment by the machine we introduce in the next section. A
requirement of the transformations is that they preserve the leftmost language of the
untransformed grammar.
The first transformation takes a grammar $\mathcal{G}$ with initial vocabulary $V_i$ and produces
a new grammar $\mathcal{G}'$ with a unique initial symbol, say $S$ (this symbol must be a new symbol
not originally in $V$). The transformation simply consists of adding a new production
for every initial symbol of $\mathcal{G}$. For example, if $V_i = \{A, B, C\}$, we add the productions

\[
\begin{aligned}
S &\to A \\
S &\to B \\
S &\to C
\end{aligned}
\]

and let the new initial vocabulary be $V_i = \{S\}$. It is clear that the leftmost language
of $\mathcal{G}$ is preserved by this transformation. The second transformation is the process of
normalization. A production $\Gamma \to \Delta$ is called normal if both $\Gamma$ and $\Delta$ have length
1 or 2. A normal grammar is a grammar in which every production is normal. Normalization
produces a normal grammar from a grammar, while preserving the leftmost
language of the grammar. The transformation consists in iterating the following production
replacements (the symbol $N$ is always taken to be a new symbol not in $V$ at
every production replacement):

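\[
\begin{array}{lcl}
\Gamma \to AB\Delta & \Rightarrow & \Gamma \to N\Delta \\
 & & N \to AB \\
AB\Gamma \to \Delta & \Rightarrow & N\Gamma \to \Delta \\
 & & AB \to N
\end{array}
\]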

For the last production replacement, the same symbol $N$ must be used for all productions
with the same left side, e.g. $AB\Gamma$. To see why the leftmost language of the original
grammar is preserved, consider the two cases that arise: if $\Gamma \to AB\Delta$ is leftmost
applicable, so is $\Gamma \to N\Delta$, and once applied, by leftmost reduction and since no other
production may involve the newly introduced symbol $N$, the next production to apply
must be $N \to AB$; similarly, if $AB\Gamma \to \Delta$ is leftmost applicable, so is $AB \to N$,
and once applied, the leftmost applicable productions include $N\Gamma \to \Delta$ (again, since
the newly introduced symbol $N$ cannot appear in other productions not of the form
$N\Gamma \to \cdots$).
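
The two replacement rules can be iterated mechanically. The following Python sketch is one possible rendering, under stated assumptions: each side of a production is a tuple of symbol names, the fresh symbols are named `N1`, `N2`, ... and are assumed not to occur in the grammar, and long right sides are split before long left sides (an ordering chosen here for simplicity; the text does not prescribe one). The function name `normalize` is hypothetical.

```python
from itertools import count


def normalize(productions):
    """Split productions until both sides have length 1 or 2.

    productions: list of (left, right) pairs, each side a tuple of symbol names.
    Fresh nonterminals are named N1, N2, ... (assumed absent from the grammar).
    """
    fresh = (f"N{i}" for i in count(1))
    left_names = {}  # long left side -> the fresh symbol shared by its productions
    result, pending = [], list(productions)
    while pending:
        left, right = pending.pop(0)
        if len(right) > 2:
            # Gamma -> A B Delta  becomes  Gamma -> N Delta  and  N -> A B
            n = next(fresh)
            pending.append((left, (n,) + right[2:]))
            pending.append(((n,), right[:2]))
        elif len(left) > 2:
            # A B Gamma -> Delta  becomes  N Gamma -> Delta  and  A B -> N,
            # reusing the same N for every production with this left side.
            if left not in left_names:
                left_names[left] = next(fresh)
                pending.append((left[:2], (left_names[left],)))
            pending.append(((left_names[left],) + left[2:], right))
        elif (left, right) not in result:
            result.append((left, right))
    return result
```

For instance, the single production `S -> A B C D` is rewritten into `N1 -> A B`, `S -> N2 D` and `N2 -> N1 C`, each of which is normal.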

The next transformation we consider isolates the generation of terminal symbols
into their own productions. Assuming the grammar under consideration is normal, iterate
the following production replacements (the symbols $N, N_1, N_2$ are taken to be new
symbols not in $V$ for every replacement, and the symbols $t, t_1, t_2$ are taken to be terminal
symbols):

\[
\begin{array}{lcl}
\Gamma \to At & \Rightarrow & \Gamma \to AN \\
 & & N \to t \\
\Gamma \to tA & \Rightarrow & \Gamma \to NA \\
 & & N \to t \\
\Gamma \to t_1 t_2 & \Rightarrow & \Gamma \to N_1 N_2 \\
 & & N_1 \to t_1 \\
 & & N_2 \to t_2
\end{array}
\]
Figure 1: Production machine



It is clear that this transformation preserves the leftmost language of the original grammar.
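
The three replacements above amount to substituting a fresh nonterminal for every terminal that occurs in a right side of length 2. A minimal Python sketch follows, assuming a normal grammar whose productions are pairs of tuples of symbol names; the function name `isolate_terminals` and the fresh names `T1`, `T2`, ... are hypothetical and assumed not to clash with existing symbols.

```python
from itertools import count


def isolate_terminals(productions, terminals):
    """Replace terminals occurring in right sides of length 2 by fresh nonterminals.

    productions: list of (left, right) pairs of tuples of symbol names (normal grammar).
    terminals: set of terminal symbol names.
    """
    fresh = (f"T{i}" for i in count(1))  # a new symbol for every replacement
    result = []
    for left, right in productions:
        if len(right) == 1:
            # Right sides of length 1 are not covered by the replacements; keep them.
            result.append((left, right))
            continue
        new_right, extra = [], []
        for symbol in right:
            if symbol in terminals:
                n = next(fresh)
                new_right.append(n)
                extra.append(((n,), (symbol,)))  # N -> t
            else:
                new_right.append(symbol)
        result.append((left, tuple(new_right)))
        result.extend(extra)
    return result
```

Applied to `S -> A x` and `A -> y z`, the sketch yields `S -> A T1`, `T1 -> x`, `A -> T2 T3`, `T2 -> y` and `T3 -> z`.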

Please note that the first transformation applied to a grammar $\mathcal{G}$ has the same effect
as the last transformation when one considers the dual grammar $\mathcal{G}^{-1}$, namely to isolate
the production of the (then terminal) symbol $S$.

The last transformation has the following interesting (and useful) consequence: 

Lemma 2.1 Let $\mathcal{G}$ be a grammar to which the last transformation above has been applied.
If a terminal symbol is produced after leftmost applications of productions, then
every symbol to the left of that terminal symbol will also be a terminal symbol.

Proof: By the last transformation applied to the given grammar, since a
terminal is produced, the leftmost applicable production must have
been of the form $N \to t$ with $t$ the produced terminal symbol. Assume
that there are nonterminals to the left of that terminal. Since no new
nonterminal has been introduced, no terminal may be used on the left of a
production, and the grammar is assumed to be well-formed, there must
exist a production applicable to nonterminals on the left of the terminal.
But this contradicts the fact that the production $N \to t$ was leftmost. □



3 Production machines 

Lambek describes in [9] a machine that allows us to generate and recognize sentences
from a production grammar. A production machine [9, 10] corresponds roughly to a
combination of two pushdown automata. It consists of three potentially infinite tapes
subdivided into squares. The middle tape is the input/output tape, the top and bottom
tapes are storage tapes. Only one square in each tape is scanned at any given point
in time. The two storage tapes can move in either direction, whereas the input/output
tape moves only from right to left. The tapes are positioned so that all three scanned
squares are aligned (see Figure 1).

Seven moves are defined for production machines, parametrized by a given grammar
$\mathcal{G}$. The moves involve the scanned squares of the tapes:



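[Table of the seven moves: not recoverable from the source. The surviving fragments show that one move applies when $(A)B \to C(D)$ is in $\mathcal{P}$, that another move is conditional on $D \in V_t$, and that each move directs every tape to move left, move right, or stay.]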

The (•) notation indicates that the scanned square may or may not be empty, and 
represents an empty square. A mention of "left", "right" or "stay" means that the
corresponding tape should be moved left, moved right, or left in its current position. We use
the expression "move $\overset{5}{\longmapsto}$ via production $P$" to state explicitly which production is
involved in the move.

The machine may be used either to generate sentences from the grammar or to
recognize sentences in the grammar. Those two activities involve different subsets of
the general moves presented above, and different starting and ending states for the
machine. We will therefore speak of production machines as though there were two
types of machines: the generative machine $\mathcal{M}_g(\mathcal{G})$ corresponding to a grammar $\mathcal{G}$ and
the recognitive machine $\mathcal{M}_r(\mathcal{G})$ corresponding to a grammar $\mathcal{G}$.






The generative machine of $\mathcal{G}$ has the following initial and terminal states:

Initial: [diagram not recovered]

Terminal: <sentence>

The machine is defined with respect to the grammar $\mathcal{G}$, and the moves that should be
attempted in order are the following: 5, 6, 1, 2, 3, 4. We say that a sentence $\sigma$ is
producible by $\mathcal{M}_g(\mathcal{G})$ if the machine starts in the initial state and ends up in the
terminal state with $\sigma$ as the sentence.




The recognitive machine of $\mathcal{G}$ has the following initial and terminal states:

Initial: <sentence>

Terminal: [diagram not recovered]

The machine is defined with respect to the dual grammar $\mathcal{G}^{-1}$ and the moves that
should be attempted in order are the following: 5, 7, 1, 2, 3, 4. We say that a sentence
$\sigma$ is recognizable by $\mathcal{M}_r(\mathcal{G})$ if the machine ends in the terminal state after starting
in the initial state with $\sigma$ as the sentence.



We refer the reader to [9] for sample executions of the machine to generate and 
recognize sentences in a simple grammar for the English language. 

One look at the moves of a production machine shows that the machine is fundamentally
nondeterministic. Indeed, move $\overset{5}{\longmapsto}$ is used in a nondeterministic way if more
than one production with a left side of $(A)B$ is present in the grammar. For a generative
production machine, this allows the machine to generate different sentences. For a
recognitive machine, this introduces a complication: possibly only one nondeterministic
choice of production to apply next leads to the terminating state of the machine, as
some examples in [9] show. Hence, a recognitive production machine must consider
concurrently all the possible applications of move 5 and terminate when one leads to
the terminating state. A sentence $\sigma$ is therefore recognizable if one of the concurrently
considered applications of move $\overset{5}{\longmapsto}$ of the recognitive production machine
reaches the terminal state.



4 Generation 

We analyze in this section the generative production machine $\mathcal{M}_g(\mathcal{G})$ of a given grammar
$\mathcal{G}$. We show that the language generated by $\mathcal{M}_g(\mathcal{G})$ is exactly the leftmost language
of $\mathcal{G}$: a sentence $\sigma$ is producible by $\mathcal{M}_g(\mathcal{G})$ if and only if $\sigma$ is in the leftmost
language of $\mathcal{G}$. Without loss of generality, we may assume that the grammar $\mathcal{G}$ under
consideration is a normal grammar with a unique initial symbol $S$ and with a unique
production corresponding to the generation of every terminal symbol. As we saw earlier,
any grammar may be transformed into such a grammar defining the same leftmost
language.

The idea underlying the proof is straightforward. Given a grammar $\mathcal{G}$ and a generative
production machine $\mathcal{M}_g(\mathcal{G})$, we show that the graph corresponding to the leftmost
reduction relation is isomorphic to a graph corresponding to the moves of the machine.
Therefore, a string in the leftmost language of $\mathcal{G}$ obtained by leftmost reductions may
be generated by the machine following the moves specified by the isomorphism, and
vice versa.

The main operational tool we use is a transition graph. Given a set $D$, a subset
$I$ of $D$ and a non-transitive relation $<$ over $D$, define a family of subsets of $D$ by the
equations

\[
\begin{aligned}
S_0 &= I \\
S_{n+1} &= \{ b : a < b \text{ for some } a \in S_n \}
\end{aligned}
\]

The transition graph of $<$ generated by $I$ is the graph with nodes in $\bigcup_{i=0}^{\infty} S_i$ and an
edge between $a, b \in \bigcup_{i=0}^{\infty} S_i$ if and only if $a < b$. Define a layer of the transition graph
$\mathcal{T}$ over $<$ generated by $I$ to be the set of all elements of the graph at a certain distance
from an element of the initial subset, $lay_i(\mathcal{T}) = \{ a : \exists a_0, \ldots, a_{i-1} \in \mathcal{T} \text{ such that } a_0 \in
I \text{ and } a_0 < \cdots < a_{i-1} < a \}$. If $\mathcal{T}$ is defined by the above equations for $S_0$ and $S_{n+1}$,
it is not hard to see that $lay_i(\mathcal{T}) = S_i$.
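
The equations for $S_0$ and $S_{n+1}$ translate directly into a layer-by-layer computation. The sketch below is illustrative only: the relation is passed as a hypothetical function `step` returning the set $\{b : a < b\}$ for a given $a$, and only finitely many layers are computed.

```python
def layers(initial, step, depth):
    """Return [S_0, ..., S_depth], where S_{n+1} = {b : a < b for some a in S_n}.

    initial: the generating subset I.
    step: function mapping an element a to the set {b : a < b}.
    depth: index of the last layer to compute.
    """
    result = [set(initial)]
    for _ in range(depth):
        # The next layer collects every successor of every element of the last layer.
        result.append({b for a in result[-1] for b in step(a)})
    return result
```

With the toy relation $n < n + 1$ and $n < 2n$ on the integers, `layers({1}, lambda n: {n + 1, 2 * n}, 2)` evaluates to `[{1}, {2}, {3, 4}]`. Passing a one-step leftmost reduction function as `step` would give the layers of a transition graph over strings in the same way.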

For a given grammar $\mathcal{G}$ with initial symbol $S$, the leftmost reduction relation over
strings in $V^*$ leads to the transition graph of $\to$ generated by $\{S\}$, which we will denote
by $\mathcal{L}$. It is this transition graph that we will show is isomorphic to a transition graph
derived from the moves of the generative machine.

Taking the $\longmapsto$ relation over the states of the machine also leads to a transition
graph, but it is easily seen to be much larger than the transition graph $\mathcal{L}$, since for
every production application (which corresponds to a move $\overset{5}{\longmapsto}$), there are other
administrative moves that the machine needs to perform. However, the key consideration
is the following: all the moves the machine makes are deterministic, except for move
$\overset{5}{\longmapsto}$, since there might be many applicable productions at that point. If the grammar is
well-formed, the following lemma is easily seen to hold:

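Lemma 4.1 (Determinacy) Let $s$ be a state of $\mathcal{M}_g(\mathcal{G})$ which allows a move $\overset{5}{\longmapsto}$ to a
state $s'$. There exist unique states and moves

\[
s \overset{5}{\longmapsto} s' \overset{m_1}{\longmapsto} s_1 \overset{m_2}{\longmapsto} \cdots \overset{m_k}{\longmapsto} s_k
\]

such that $m_1, \ldots, m_k \neq 5$ and state $s_k$ allows either no moves or a move $\overset{5}{\longmapsto}$.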

We define a reduction relation $\overset{c}{\longmapsto}$ between states of $\mathcal{M}_g(\mathcal{G})$ that allow either a
move $\overset{5}{\longmapsto}$ or no move at all: in the statement of the above lemma, if $s \overset{5}{\longmapsto} s'$ via
production $P$, we say that $s \overset{c}{\longmapsto} s_k$ via production $P$. This is well-defined (by the
above lemma) and can be seen as a collapse of the $\longmapsto$ transitions. The following
result is a reformulation of lemma 4.1:

Corollary 4.2 Let $s$ be a state of $\mathcal{M}_g(\mathcal{G})$. If $s \overset{c}{\longmapsto} s_1$ via production $P$ and $s \overset{c}{\longmapsto} s_2$
via production $P$, then $s_1 = s_2$.



Let $\mathcal{T}$ be the transition graph of $\overset{c}{\longmapsto}$ generated by the initial state of the
generative machine. We now show that $\mathcal{L}$ is isomorphic to $\mathcal{T}$. Let us first define a mapping between strings
of elements of $V$ and states of $\mathcal{M}_g(\mathcal{G})$. This function will be the isomorphism we are
looking for.

Definition 4.3 Given a grammar $\mathcal{G} = (V, V_i, V_t, \mathcal{P})$, and $\sigma$ a string of elements of $V$.
Suppose $\sigma$ is of the form $t_1 \cdots t_p n_1 \cdots n_q P_1 P_2 m_1 \cdots m_r$, where $t_1, \ldots, t_p$ are prefixing
terminal symbols, $n_1, \ldots, n_q, P_1, P_2, m_1, \ldots, m_r$ are nonterminal symbols and the leftmost
applicable production of $\mathcal{P}$ to $\sigma$, if any, is of the form $P_1 P_2 \to \cdots$ ($P_1$ might be
empty). Define the function $F$ by

$F(\sigma) =$ [machine-state diagram not recovered]

or (if no production is applicable to $\sigma$)

$F(\sigma) =$ [machine-state diagram not recovered]

The symbols $P_1$ (if any) and $P_2$ are said to be in application position.

Lemma 4.4 $F$ is injective.

Proof: Given $\sigma, \sigma' \in \mathcal{L}$. Assume $F(\sigma) = F(\sigma')$. Then $\sigma = t_1 \cdots t_p \sigma_1$
and $\sigma' = t_1 \cdots t_p \sigma'_1$, with $\sigma_1, \sigma'_1$ strings of nonterminals. If no symbols are
in application position, then by the definition of $F$ both $\sigma, \sigma'$ are strings
of terminals, and by the above $\sigma = \sigma'$. If $P_1$ and $P_2$ are in application
position ($P_1$ might be empty), then $\sigma_1 = \sigma_2 P_1 P_2 \sigma_3$ and $\sigma'_1 = \sigma'_2 P_1 P_2 \sigma'_3$,
and again by the definition of $F$, $\sigma_2 = \sigma'_2$ and $\sigma_3 = \sigma'_3$. Thus $\sigma = \sigma'$ and
$F$ is injective. □



Lemma 4.5 Given $\sigma, \sigma' \in \mathcal{L}$, then $\sigma \to \sigma'$ implies $F(\sigma) \overset{c}{\longmapsto} F(\sigma')$.

Proof: Given $\sigma, \sigma' \in \mathcal{L}$. Assume $\sigma$ is of the form $\tau A_1 \cdots A_n$. Four cases
arise, depending on the form of the production applicable to $\sigma$ (there must
be one).

1. $A_1 \to t$ with $t$ a terminal symbol, and $\sigma'$ is of the form $\tau t A_2 \cdots A_n$

2. $A_1 A_2 \to t$ with $t$ a terminal symbol, and $\sigma'$ is of the form $\tau t A_3 \cdots A_n$

3. $A_k \to \Gamma$ for some $k$, and $\sigma'$ is of the form $\tau A_1 \cdots A_{k-1} \Gamma A_{k+1} \cdots A_n$

4. $A_k A_{k+1} \to \Gamma$ for some $k$, and $\sigma'$ is of the form $\tau A_1 \cdots A_{k-1} \Gamma A_{k+2} \cdots A_n$

It is straightforward to show that in all those cases, $F(\sigma) \overset{c}{\longmapsto} F(\sigma')$. □



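Lemma 4.6 Given $\sigma, \sigma' \in \mathcal{L}$, then $F(\sigma) \overset{c}{\longmapsto} F(\sigma')$ implies $\sigma \to \sigma'$.

Proof: Assume $F(\sigma) \overset{c}{\longmapsto} F(\sigma')$ via production $\Gamma \to \Delta$. By definition
of $F$, $\Gamma \to \Delta$ is leftmost applicable to $\sigma$. Let $\sigma \to \sigma''$ via production
$\Gamma \to \Delta$. By lemma 4.5, $F(\sigma) \overset{c}{\longmapsto} F(\sigma'')$ via production $\Gamma \to \Delta$. By
corollary 4.2, $F(\sigma') = F(\sigma'')$, and by lemma 4.4, $\sigma' = \sigma''$ and thus
$\sigma \to \sigma'$. □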



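Lemma 4.7 $F(\mathcal{L}) = \mathcal{T}$.

Proof: We show by induction on $i$ that $\forall i$ $F(lay_i(\mathcal{L})) = lay_i(\mathcal{T})$, which
clearly implies the statement of the lemma.

The base case of the induction is trivial, since $F(S)$ is the state that generates $\mathcal{T}$.

For the induction step, we first show $F(lay_{i+1}(\mathcal{L})) \subseteq lay_{i+1}(\mathcal{T})$. Given
$\sigma \in lay_{i+1}(\mathcal{L})$. Thus, there exists a $\sigma' \in lay_i(\mathcal{L})$ such that $\sigma' \to \sigma$. By
the induction hypothesis, $F(\sigma') \in lay_i(\mathcal{T})$. By lemma 4.5, $F(\sigma') \overset{c}{\longmapsto}
F(\sigma)$, and by definition of transition graph $\mathcal{T}$, $F(\sigma) \in lay_{i+1}(\mathcal{T})$.

We next show $lay_{i+1}(\mathcal{T}) \subseteq F(lay_{i+1}(\mathcal{L}))$. Let $s \in lay_{i+1}(\mathcal{T})$. Thus
there exists an $s' \in lay_i(\mathcal{T})$ with $s' \overset{c}{\longmapsto} s$ via production $\Gamma \to \Delta$. By the
induction hypothesis, there exists a $\sigma' \in lay_i(\mathcal{L})$ such that $F(\sigma') = s'$.
Let $\sigma$ be the application of $\Gamma \to \Delta$ to $\sigma'$. By lemma 4.5, $F(\sigma') \overset{c}{\longmapsto}
F(\sigma)$, and thus $s' \overset{c}{\longmapsto} F(\sigma)$. By corollary 4.2, $F(\sigma) = s$ and thus $s \in
F(lay_{i+1}(\mathcal{L}))$. This completes the induction and the proof. □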
Lemma 4.8 $F$ is an isomorphism of graphs from $\mathcal{L}$ to $\mathcal{T}$.

Proof: By lemmas 4.4 and 4.7, $F$ is a bijective function from $\mathcal{L}$ to $\mathcal{T}$. By
lemmas 4.5 and 4.6, $F$ is a transition graph isomorphism. □

This isomorphism implies the following result for the generative version of the
production machine for a given grammar $\mathcal{G}$.

Proposition 4.9 Given a grammar $\mathcal{G}$, a sentence $\sigma$ is producible by $\mathcal{M}_g(\mathcal{G})$ if and only
if $\sigma$ is in the leftmost language of $\mathcal{G}$.

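Proof: ($\Rightarrow$) Given $\sigma = t_1 \cdots t_n$ a string in the leftmost language of $\mathcal{G}$. Thus
there exists a chain in $\mathcal{L}$ from $S$, the initial symbol of $\mathcal{G}$, to $\sigma$ representing
the leftmost reduction derivation of $\sigma$. By the isomorphism of lemma 4.8,
there exists a chain in $\mathcal{T}$

[chain of machine states not recovered]

Since

[machine-state diagrams not recovered]

and extending (uniquely, by lemma 4.1) the $\overset{c}{\longmapsto}$ transitions, we get a
sequence of machine moves

[sequence of machine moves not recovered]

and thus $\sigma$ is producible by $\mathcal{M}_g(\mathcal{G})$.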

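($\Leftarrow$) Given $\sigma = t_1 \cdots t_n$ a string producible by $\mathcal{M}_g(\mathcal{G})$. There exist
machine moves

[sequence of machine moves not recovered; the surviving labels are moves 1, 5 and 6]

Starting from

[machine-state diagram not recovered]

and collapsing the $\longmapsto$ transitions into $\overset{c}{\longmapsto}$ transitions,
we get a chain in $\mathcal{T}$. By the isomorphism of lemma 4.8, we get a
chain in $\mathcal{L}$

\[
S \to \cdots \to \sigma
\]

and thus $\sigma$ is in the leftmost language of $\mathcal{G}$. □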
5 Recognition

Fundamentally, the recognitive machine $\mathcal{M}_r(\mathcal{G})$ is similar to the generative one: it
defines essentially the same moves (except that the move producing terminals is replaced
by a move that accepts the next symbol from the input/output tape), and it uses the dual
of the grammar under consideration.

One may again derive an isomorphism in the manner described in the previous 
section, connecting the moves of the recognitive machine to the leftmost reduction 
relation defined on the dual of the grammar. One needs to extend the definition of 
transition graphs to use strings of terminals as the initial set. The extension is fairly 
trivial, and is left as an exercise. 

The language generated by $\mathcal{M}_g(\mathcal{G})$ is the leftmost language of $\mathcal{G}$, the one obtained
by allowing only leftmost reductions. Correspondingly, the language recognized by
$\mathcal{M}_r(\mathcal{G})$ is a dual to the leftmost language, characterized as those sentences that can be
recognized via leftmost reductions in the dual grammar.

It is clear that the recognized language is a subset of the full language of the grammar.
The following grammar shows that the recognized language is in general a proper
subset of the full language, and need not be equal to the generated language:

\[
\begin{aligned}
S &\to AG \\
F &\to C \\
G &\to BC \\
E &\to AB \\
BC &\to z \\
A &\to x
\end{aligned}
\]

The full language generated by this grammar is $\{xz\}$. The leftmost language of this
grammar is also $\{xz\}$. However, trying to recognize the string $xz$ via leftmost reductions
in the dual grammar leads to a unique derivation

\[
xz \to Az \to ABC \to EC \to EF
\]

and thus the string is not recognized by the machine.
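
The derivation above can be replayed mechanically. The following Python sketch builds the dual of the example grammar and applies leftmost reductions until none is applicable. It rests on the same assumptions as any such illustration: symbols are single characters, leftmost applicability is read as "smallest position first, then shortest left side", and remaining ties are broken arbitrarily (which is harmless here, since the derivation is unique). The names `dual`, `leftmost_step` and `leftmost_derivation`, and the `max_steps` guard, are hypothetical.

```python
# Productions of the example grammar, as (left side, right side) pairs of strings.
PRODUCTIONS = [("S", "AG"), ("F", "C"), ("G", "BC"), ("E", "AB"), ("BC", "z"), ("A", "x")]


def dual(productions):
    """Swap the two sides of every production."""
    return [(right, left) for left, right in productions]


def leftmost_step(productions, s):
    """Apply one leftmost applicable production to s, or return None if there is none."""
    # For each production, only its first occurrence matters when minimizing the position.
    candidates = [(s.find(left), len(left), left, right)
                  for left, right in productions if left in s]
    if not candidates:
        return None
    pos, _, left, right = min(candidates)
    return s[:pos] + right + s[pos + len(left):]


def leftmost_derivation(productions, s, max_steps=100):
    """Return the list of strings visited by repeated leftmost reduction."""
    trace = [s]
    for _ in range(max_steps):
        s = leftmost_step(productions, s)
        if s is None:
            break
        trace.append(s)
    return trace
```

Here `leftmost_derivation(dual(PRODUCTIONS), "xz")` returns `["xz", "Az", "ABC", "EC", "EF"]`: the reduction stops at $EF$ rather than at the initial symbol $S$, matching the derivation displayed above.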

6 Conclusion 

We provide in this paper an analysis of the production machines described by Lambek
in [9, 10]. We determine the subsets of the full language of a grammar that are
generated and recognized by the machines. The generated language corresponds
to the subset of the full language one obtains by applying leftmost reductions, and is
in general a proper subset of the full language. Conversely, the recognized language
corresponds to the subset of the full language one obtains by applying leftmost reductions
in the dual grammar, and is also in general a proper subset of the full language.
Moreover, the generated and recognized languages need not agree.
The generative version of production machines can in fact be regarded as implementing
a generalized version of a Markov algorithm [12, 14]. A Markov algorithm
on a production grammar $\mathcal{G}$ consists of repeatedly applying a leftmost applicable
production to a string, and if more than one production is leftmost applicable, the first
production (given an ordering of the productions) is applied. As such, the algorithm
is fully deterministic. In contrast, while a generative production machine also applies
leftmost applicable productions, the choice of which production to apply if more than
one is applicable is nondeterministic.

Let us mention a possible extension of the description of the production machines
that would allow for the generation and recognition of the full language. Recognition is
the easiest to extend: since the machine verifies all the possible choices of production
in parallel whenever a move $\overset{5}{\longmapsto}$ is applicable, one adds the parallel choice of not applying
any production, and passing on to the next possible move of the machine. One can
extend generation in the same way, by adding a nondeterministic choice of not applying
a move $\overset{5}{\longmapsto}$ when it is possible to do so. This extension has a caveat: generation may
fail to produce a sentence.

An important class of grammars does not satisfy the criteria set forth for generation
and recognition via production grammars: translation grammars, which take strings
of initial symbols as initial states. For example, the initial symbols could be words
of English, and terminal symbols words in French, and the grammar would translate
English into French. The production machines presented in this paper can be modified
easily to handle such grammars.